Abstract: This paper introduces a new method to solve the
cross-domain recognition problem. Different from the traditional
domain adaptation methods, which rely on a global domain shift
for all classes between source and target domains, the proposed
method is more flexible to capture individual class variations
across domains. By adopting a natural and widely used assumption,
“the data samples from the same class should lie on
a low-dimensional subspace, even if they come from different
domains”, the proposed method circumvents the limitation of
the global domain shift, and solves the cross-domain recognition
by finding the compact joint subspaces of source and target
domains. Specifically, given labeled samples in the source domain, we
construct subspaces for each of the classes. Then we construct
subspaces in the target domain, called anchor subspaces, by
collecting unlabeled samples that are close to each other and
highly likely to all fall into the same class. The corresponding
class label is then assigned by minimizing a cost function which
reflects the overlap and topological structure consistency between
subspaces across source and target domains, and within anchor
subspaces, respectively. We further combine the anchor subspaces
with the corresponding source subspaces to construct the compact joint
subspaces. Subsequently, one-vs-rest SVM classifiers are trained
in the compact joint subspaces and applied to unlabeled data
in the target domain. We evaluate the proposed method on two
widely used datasets: an object recognition dataset for computer
vision tasks, and a sentiment classification dataset for natural
language processing tasks. Comparison results demonstrate that
the proposed method outperforms the comparison methods on
both datasets.

Index Terms: Unsupervised, cross-domain recognition, compact
joint subspace.


I. Introduction 

Many machine learning methods assume that the
training data (labeled) and testing data (unlabeled) are
from the same feature space and follow similar distributions.
However, this assumption may not hold in many real
applications, where the training data is obtained from one
domain while the testing data comes from a different domain.
As a visual example, Figure 1 shows coffee-mug images
collected from four different domains (Amazon, Caltech256,
DSLR and Webcam), which present different image resolutions
(Webcam vs DSLR), viewpoints (Webcam vs Amazon),
background complexities (Amazon vs Caltech256), object
layout patterns, etc.



Fig. 1. Sample images from four different domains: Amazon, Caltech256, 
DSLR and Webcam. 

On the other hand, the samples also show different distributions
in the feature space, as illustrated in Figure 2. The
2-D plots are the first two feature dimensions reduced from the
original 800-dimensional SURF feature space (described in
Section IV-B) by using PCA. Figure 2 (a) and (b) show
the distributions of “mouse” and “mug”, and of “monitor” and
“projector”, in different domains, respectively. It is clear
that the samples from different domains have different
distributions. Moreover, the relations between two classes
in different domains are also different. Taking Figure 2 (a)
for example, in the “Webcam” domain, the mug samples (black
crosses) are usually located at the top-right side of the mouse ones
(black circles); but in the “Amazon” domain, the mug samples
(red crosses) are usually located at the left side of the mouse ones
(red circles).

Fig. 2. Illustrations of sample distributions of different domains in feature space. The 2-D plots are the first two feature dimensions reduced from the original 800-dimensional
SURF feature space (described in Section IV-B) by using PCA. (a) The distributions of “mouse” and “mug” in four domains. (b) The distributions
of “monitor” and “projector” in four domains. Best viewed in color.


These domain differences lead to a dilemma: 1) directly
applying the classifiers trained from one domain to another
may result in significantly degraded performance [?]; 2) labeling
data in each domain as training samples would be very
expensive, especially in large-scale applications. The dilemma
consequently poses the cross-domain recognition problem,
namely how to utilize the labeled data in a source domain
to classify/recognize the unlabeled data in a target domain.

To achieve cross-domain recognition, a number of Domain
Adaptation (DA) methods have been developed to adapt the
classifier from one domain to another [?]. Subspace-based
DA has been found to be very effective in handling the cross-domain
problem [?]. These methods either construct
a set of intermediate subspaces for modeling the shifts between
domains [?], or generate a domain-invariant
subspace in which the data from source and target domains
can represent each other well [?]. All the methods
mentioned above use the data from each domain all together
to generate a single subspace for each domain. In practice,
however, the intrinsic feature shift of each class may not be
exactly the same. The existing methods can obtain a global
domain shift, but ignore the individual class differences across
domains.

To circumvent the limitation of the global domain shift, we
adopt a natural and widely used assumption that “the data
samples from the same class should lie on a low-dimensional
subspace, even if they come from different domains” [?].
This assumption not only holds in many computer vision
tasks, such as face recognition under varying illumination [?]
and handwritten digit recognition [?], but is also used as a
human cognitive mechanism for visual object recognition [?].
Note that this assumption does not mean that the target data
samples lie exactly on the subspace of the source samples,
since different domains show subspace shift [?]. Figure 3 gives
an illustration of a compact joint subspace covering the source
domain and the target domain for a specific class. The source and
target subspaces have an overlap, which implicitly represents
the intrinsic characteristics of the considered class. They also have
their own exclusive bases because of the domain shift, such
as varying illumination or changing view perspectives.

Based on the above assumption, we propose a new method
that solves the cross-domain recognition by finding the compact
joint subspaces of source and target domains. Specifically,
given labeled samples in the source domain, we construct
subspaces for each of the classes. Then we construct subspaces
in the target domain, called anchor subspaces, by collecting
unlabeled samples that are close to each other and highly likely
to all fall into the same class. The corresponding class label is
then assigned by minimizing a cost function which reflects
the overlap and topological structure consistency between
subspaces across source and target domains, and within anchor
subspaces, respectively. We further combine the anchor
subspaces with the corresponding source subspaces to construct the
compact joint subspaces for each class. Subsequently, the
SVM classifier is trained by using the samples in the compact
joint subspace and applied to the unlabeled data in the target
domain for classification.

The contributions of this paper are: 1) By assuming that
the data samples from one specific class, even though they
come from different domains, should lie on a low-dimensional
subspace, we generate one compact joint subspace for each
class independently. Each compact joint subspace carries
information not only about the intrinsic characteristics of the
corresponding class, but also about the specificity of each
domain. 2) To construct the compact joint subspaces, we
first generate anchor subspaces in the target domain, assign
labels to them, and combine these anchor subspaces with the
corresponding source subspaces. 3) We propose a cost function
that implicitly maximizes the overlap between the source subspace
and the target subspace for each class, while maintaining the
topological structure in the target domain. We use principal
angles as the subspace distances in the cost function instead
of the data-sample distances that were usually used in previous
methods.


[Figure 3 labels: Target Domain Subspace; Source Domain Subspace; Compact Joint Subspace; Exclusive Bases in Source; Overlap Bases; Exclusive Bases in Target]


Fig. 3. An illustration of a compact joint subspace between source and target 
domains for a specific class. This compact joint subspace consists of overlap 
bases between domains, which represent the intrinsic characteristics of this 
class implicitly, and exclusive bases of different domains, which represent the 
exclusive characteristics for each domain. 


Note that the proposed method does not need to obtain
orthogonal bases for the subspaces; instead, we use the data
samples themselves as the over-complete bases to represent
the subspace implicitly.

The remainder of this paper is organized as follows. We
review related work in Section II. Section III describes
the proposed method. Quantitative experimental results are
presented in Section IV. Section V concludes the paper.

II. Related Work 

For the cross-domain recognition problem, Domain Adaptation
is the most closely related line of work, and it is a
fundamental family of methods in machine learning and computer
vision. Here, we give a brief review of this topic. Please refer
to [14] for a comprehensive survey.

The traditional DA algorithms can be categorized into two
types, i.e., (semi-)supervised domain adaptation and unsupervised
domain adaptation, based on the availability of labeled
data from the target domain. (Semi-)supervised DA assumes
that some labeled data are available in the target domain.
Daume [?] proposed to map the data from both source and
target domains to a high-dimensional feature space, and trained
the classifiers in this new feature space. Saenko et al. [16]
proposed a metric learning approach that can adapt labeled
data of few classes from the target domain to the unlabeled
target classes. In [?], the authors proposed a co-regularization
model that augmented the feature space to jointly model source
and target domains. In [?], Chen et al. proposed a co-training
based domain adaptation method. They first train an initial
category model on samples from the source domain and then
use it for labeling samples from the target domain. The category
model is then updated with the newly labeled target samples
through co-training. Pan and his colleagues [?] analyzed
the transfer component that maps both domains onto a kernel
space to preserve some properties of domain-specific data
distributions. Duan et al. [20] proposed an SVM-based method
which minimized the mismatch between source and target
domains, using both labeled and unlabeled data. Shekhar et
al. [?] proposed to learn a single dictionary to represent both
source and target domains. The work in [21] proposed to use
a linear transformation to map features from the target domain
to the source domain, and to generate the classification model
by training on the source domain and target samples based on
the feature transformation. Motivated by the recent success of
deep learning, some hierarchical domain adaptation methods
have also been proposed [22], which need large-scale data to
(pre-)train the deep neural network model. In [?], Xiao et
al. proposed a semi-supervised kernel matching based domain
adaptation method that learns a prediction function on the
source domain while mapping the target samples to similar
source samples by matching the target kernel matrix to the
source kernel matrix.

Unsupervised DA, on the other hand, does not use any label
information in the target domain, and is therefore considered
more challenging and more useful in real-world applications.
In [?], Gopalan et al. constructed a set of intermediate
subspaces along the geodesic path that links the source and
target domains on the Grassmann manifold. In [4], Gong et
al. proposed a geodesic flow kernel to model the shift between
the source and target domains. In [6], a new intermediate subspace
construction method was proposed, which constructed
the subspaces by gradually reducing the reconstruction error of
the target data instead of using manifold walking strategies.
Jhuo et al. [?] learned a transformation so that the source
samples can be represented by target samples in a low-rank
way. Fernando et al. proposed to learn a mapping function
which aligns the sample representations from source and target
domains [?]. Tommasi et al. proposed a naive Bayes
nearest neighbor-based domain adaptation algorithm that
iteratively learns a class metric while inducing a large margin
separation among classes for each sample. Baktashmotlagh et
al. [?] proposed to use the Riemannian metric as a measure of
distance between the distributions of source and target domains.
In [?], Long et al. proposed to learn a domain-invariant
representation by jointly performing feature matching
and instance weighting. Cui et al. [?] treated the samples from
each domain as one point (i.e., a covariance matrix) on a
Riemannian manifold, and then interpolated some intermediate
points along the geodesic, which are used to bridge the two
domains.

The algorithm proposed in this paper is an unsupervised
cross-domain recognition method. It differs from the
traditional domain adaptation methods in that it constructs the
low-dimensional compact joint subspaces for each class
independently, which avoids the global domain shift limitation
and captures the individual domain variations of each class.

III. Proposed Model 

A. Problem 

Suppose there are two sets of data samples, one from the source
domain $S$, denoted as $\{x_i^S\}_{i=1}^{N_s}$ with $x_i^S \in \mathbb{R}^d$, and the other
from the target domain $T$, denoted as $\{x_i^T\}_{i=1}^{N_t}$ with $x_i^T \in \mathbb{R}^d$, where
$d$ is the data dimension. $N_s$ and $N_t$ denote the number of
data samples in the source and target domains, respectively. The
labels of all data samples in the source domain, denoted as
$Y^S = \{y_i^S\}_{i=1}^{N_s}$, are known, where $C$ is the number of
classes and $y_i^S \in \{0,1\}^C$ is a $C$-bit binary code of the $i$th data
sample in the source domain. If this data sample belongs to class
$j$, the $j$th bit of $y_i^S$ is 1 and all other bits are 0. Our aim is to
estimate $Y^T = \{y_i^T\}_{i=1}^{N_t}$, the labels of all the data samples in
the target domain.
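
As a small illustration of the label encoding described above, the sketch below builds the C-bit binary code for a few source samples. The function name `one_hot`, the 0-based class indices and the toy labels are assumptions made for this example only.

```python
def one_hot(label, num_classes):
    """Return the C-bit binary code with a single 1 at position `label` (0-based)."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1 if k == label else 0 for k in range(num_classes)]


# Toy source labels for N_s = 3 samples and C = 3 classes (illustrative values).
source_labels = [0, 2, 1]
Y_S = [one_hot(c, 3) for c in source_labels]
```

Each code contains exactly one non-zero bit, which is the property the label assignment in Section III-D relaxes and later restores.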

B. Overview of the proposed method 


[Figure 4 panels: (a) Source Domain; (b) Target Domain; Classifiers]

Fig. 4. The overview of the proposed model. (a) Subspaces for each class
in the source domain. The bars with the same color denote the bases of one
class. (b) Anchor subspace construction. The points in the target domain are the
data samples. Data samples in each circle denote a core subgroup, and they
construct an anchor subspace shown as one row of black bars. (c) Compact joint
subspace construction. The ellipses denote the bases from anchor subspaces
(target domain). Best viewed in color.

The proposed algorithm aims to construct a set of compact
joint subspaces $\{M_i^{ST}\}_{i=1}^{C}$ which cover the source and target
domains, one for each of the $C$ classes, and then train the
classifiers on these compact joint subspaces. As shown in
Figure 3, since a compact joint subspace is constituted by
a source subspace and a target subspace, we need to construct
the source and target subspaces first. Hence, the proposed
algorithm consists of five steps:

Constructing subspaces in source domain. We construct a
set of subspaces $\{M_i^S\}_{i=1}^{C}$, one for each class in the source
domain. As mentioned above, we do not need a
set of orthogonal bases to represent a subspace. For each
class, we simply construct a subspace by using over-complete
bases, i.e., we take all the data that belong to this subspace. Hence,
each source subspace is $M_i^S = \{x_j^S \mid x_j^S \in C_i\}$, where $C_i$ denotes
the $i$th class, as illustrated in Figure 4(a).

Constructing anchor subspaces in target domain. To
estimate the target subspace for each class, we construct a
number of anchor subspaces in the target domain, denoted
as $\{M_j^T\}_{j=1}^{K}$, as illustrated in Figure 4(b). These anchor
subspaces are expected to 1) carry the information of the target
exclusive bases and 2) construct the target subspace compactly.
The data samples in the target domain naturally satisfy the
first expectation. To satisfy the second expectation, the data
samples in one anchor subspace should, as much as
possible, come from the same class, since a subspace constructed
from samples of different classes is usually less compact than
one constructed from samples of the same class. Thus, the
basic idea is to ensure that each anchor subspace only contains
data samples from a single class, such that it can be combined
with a source subspace for constructing the joint subspace. Since
the data in the target domain are unlabeled, we construct
anchor subspaces by grouping target data samples with high
similarities. This is motivated by the locality principle: a data
sample usually lies in close proximity to a small number of
samples from the same class [28].

Labeling the anchor subspaces. Since the compact joint
subspaces are constructed independently for each class, we
assign a label to each anchor subspace. For this purpose,
we propose a cost function that reflects 1) the cross-domain
distance between the anchor subspace and the corresponding
source subspaces; and 2) the within-domain topological
relation of the anchor subspaces in the target domain. A shorter
cross-domain distance between the corresponding subspaces
reflects the desirability of more common bases in
constructing a compact joint subspace. Therefore, the
minimization of the proposed cost function implicitly reflects
the maximization of the overlap between the source and target
subspaces.

Constructing compact joint subspaces. We construct the compact
joint subspace $M_i^{ST} = \{M_i^S \cup M_j^T \mid M_j^T \in C_i\}$, where $C_i$
denotes the $i$th class, as illustrated in Figure 4(c). As mentioned
before, we simply take all the data samples from all the
involved subspaces as the over-complete bases in constructing
the joint subspaces.

Training classifiers on the compact joint subspaces. We 
train one-vs-rest linear SVM classifiers for each class using 
the labeled data in the compact joint subspace. And then we 
apply the linear SVM classifier to the unlabeled data in the 
target domain for classification. 
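
The first and fourth steps above only group data samples, so they can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the class names and the toy 2-D points are assumptions, and the anchor labels are taken as given (in the method they come from the labeling step of Section III-D).

```python
from collections import defaultdict


def source_subspaces(samples, labels):
    """Group labeled source samples by class; each group serves as the
    over-complete basis of that class's source subspace."""
    groups = defaultdict(list)
    for x, c in zip(samples, labels):
        groups[c].append(x)
    return dict(groups)


def joint_subspaces(src, anchors, anchor_labels):
    """Append the samples of every anchor subspace to the source subspace
    of the class assigned to that anchor."""
    joint = {c: list(xs) for c, xs in src.items()}  # copy, keep `src` intact
    for members, c in zip(anchors, anchor_labels):
        joint.setdefault(c, []).extend(members)
    return joint


# Toy data: three labeled source samples and two labeled anchor subspaces.
src = source_subspaces([(0, 0), (0, 1), (5, 5)], ["mug", "mug", "mouse"])
anchors = [[(1, 1), (1, 2)], [(6, 5), (6, 6)]]
joint = joint_subspaces(src, anchors, ["mug", "mouse"])
```

The samples collected in `joint` for each class are what the one-vs-rest classifiers would then be trained on.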

C. Anchor subspaces obtained in target domain 

We construct each anchor subspace by selecting one target
data sample and combining it with its nearest neighbors. This
way, the obtained compact group of data samples is likely
to be from the same class [?]. Specifically, we first apply
the K-means algorithm to cluster all the target data into a
large number of $Z$ groups. We set $Z = N_t / \gamma$, where $\gamma$ is the
desired average group size. In each group of data, we find a
compact core subgroup consisting of a small number of $N$
samples, e.g., $N = 5$, which are taken for constructing an
anchor subspace. In this paper, the core subgroup for a group
$L$ is constructed by the following two steps.

1) Estimate the center of the core subgroup by finding the
data sample $x^*$ that minimizes the sum of distances to its nearest neighbors:

$$x^* = \arg\min_{x \in L} \sum_{y \in \mathcal{N}_{(N-1)}(x)} \|x - y\|_2, \qquad (1)$$

where $\mathcal{N}_{(N-1)}(x)$ denotes the $N-1$ nearest neighbors
of $x$ in $L$.

2) Take $x^* \cup \mathcal{N}_{(N-1)}(x^*)$ as the core subgroup for
constructing an anchor subspace.

For the groups that contain fewer than $N$ data samples, we do
not construct an anchor subspace.
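
The two steps above can be sketched as follows for a single group. This is a minimal illustration under stated assumptions: the function name `core_subgroup` and the toy 2-D points are invented for the example, and the clustering that produces the group is assumed to have been done already.

```python
import math


def core_subgroup(group, n):
    """Return the n-sample core subgroup of `group` (a list of points), or None.

    For every candidate x, sum the Euclidean distances to its n - 1 nearest
    neighbours inside the group; the candidate with the smallest sum is the
    centre x*, and x* plus those neighbours form the core subgroup.
    """
    if len(group) < n:
        return None  # too few samples: no anchor subspace for this group
    best_cost, best_members = math.inf, None
    for i, x in enumerate(group):
        # Distances from x to every other sample, smallest first.
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(group) if j != i)
        nearest = dists[: n - 1]
        cost = sum(d for d, _ in nearest)
        if cost < best_cost:
            best_cost = cost
            best_members = [i] + [j for _, j in nearest]
    return [group[k] for k in best_members]


# Toy group: three tightly packed points, one moderate outlier, one far outlier.
group = [(0, 0), (1, 0), (0, 1), (10, 10), (3, 3)]
core = core_subgroup(group, 3)
```

On this toy group the outliers are left out of the core subgroup, which is the behaviour that keeps an anchor subspace likely to contain a single class.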

D. Labeling each anchor subspace

Note that we have constructed $C$ subspaces in the source
domain, one for each class, denoted as $\{M_i^S\}_{i=1}^{C}$. Their
corresponding labels are denoted as $Y = \{y_i\}_{i=1}^{C} \in \mathbb{R}^{C \times C}$,
which is an identity matrix, i.e., the $i$th bit of $y_i$ is 1 and all
the other bits of $y_i$ are 0. In this section, we develop a new
strategy to assign class labels $Y' = \{y'_j\}_{j=1}^{K}$ to the
$K$ anchor subspaces constructed in the target domain.

This strategy includes two main components: the similarity
between subspaces and the cost function for subspace label
assignment.

1) Distance between subspaces: To calculate the distance
between two subspaces, principal angles are usually used
[?]. Principal angles between subspaces, which are often defined
between two orthonormal subspaces, serve as a classic tool in
many areas of computer science, such as computer vision
and machine learning.

We follow the definition in [29]. Given two orthonormal
matrices $M_1$ and $M_2$, the principal angles $0 \le \theta_1 \le \cdots \le \theta_m \le \pi/2$
between the two subspaces $\mathrm{span}(M_1)$ and $\mathrm{span}(M_2)$ are
defined by:

$$\cos\theta_k = \max_{u_k \in \mathrm{span}(M_1)} \; \max_{v_k \in \mathrm{span}(M_2)} u_k' v_k$$

$$\text{s.t.} \quad u_k' u_k = 1, \; v_k' v_k = 1, \quad u_k' u_i = 0, \; v_k' v_i = 0, \; (i = 1, \cdots, k-1).$$

The principal angles are related to the geodesic distance
between $M_1$ and $M_2$ as $d^2(M_1, M_2) = \sum_i \theta_i^2$ [29].

In practice, the principal angles are usually computed
from the singular value decomposition (SVD) of $M_1' M_2$,
i.e., $M_1' M_2 = U (\cos\Theta) V'$, where $U = [u_1, \cdots, u_m]$,
$V = [v_1, \cdots, v_n]$, and $\cos\Theta$ is the diagonal matrix
$\mathrm{diag}(\cos\theta_1, \cdots, \cos\theta_{\min(m,n)})$.

In this paper, as shown later in Section III-D2, we
need both the cross-domain subspace distance and the within-domain
subspace distance to define the cost function for anchor
subspace labeling. For the cross-domain subspace distance,
e.g., between subspace $M_i^S$ with $s_i$ data samples in the source
domain and $M_j^T$ with $t_j$ data samples in the target domain,
we first orthogonalize both of them to obtain $\mathcal{M}_i^S$ and $\mathcal{M}_j^T$,
and then calculate the distance as:

$$D(M_i^S, M_j^T) = \sum_{l=1}^{\min(s_i, t_j)} \sin\theta_l,$$

where $\theta_l$ come from the SVD of $(\mathcal{M}_i^S)' \mathcal{M}_j^T$, i.e.,
$(\mathcal{M}_i^S)' \mathcal{M}_j^T = U(\cos\Theta)V'$.

Similarly, for the within-domain subspace distance, e.g.,
between two target (anchor) subspaces $M_i^T$ and $M_j^T$, both
of which have $N$ data samples as defined in Section III-C, we
first orthogonalize both of them to get $\mathcal{M}_i^T$ and $\mathcal{M}_j^T$, and then
calculate the distance as:

$$D(M_i^T, M_j^T) = \sum_{l=1}^{N} \sin\theta'_l,$$

where $\theta'_l$ come from the SVD of $(\mathcal{M}_i^T)' \mathcal{M}_j^T$, i.e.,
$(\mathcal{M}_i^T)' \mathcal{M}_j^T = U(\cos\Theta')V'$.
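
As a small worked illustration of the subspace distance defined above, consider two 2-D subspaces of $\mathbb{R}^3$ that share one direction. The two orthonormal bases and the angle $\varphi$ are chosen here purely for illustration and do not come from the experiments.

```latex
\[
\mathcal{M}_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad
\mathcal{M}_2 = \begin{bmatrix} 1 & 0 \\ 0 & \cos\varphi \\ 0 & \sin\varphi \end{bmatrix}, \qquad
0 \le \varphi \le \pi/2,
\]
\[
\mathcal{M}_1' \mathcal{M}_2 = \begin{bmatrix} 1 & 0 \\ 0 & \cos\varphi \end{bmatrix}
\;\Rightarrow\; \cos\theta_1 = 1,\ \cos\theta_2 = \cos\varphi
\;\Rightarrow\; D = \sin\theta_1 + \sin\theta_2 = \sin\varphi .
\]
```

The shared direction contributes nothing to the distance, so $D$ grows only with the part of the two subspaces that does not overlap: it is 0 when the subspaces coincide ($\varphi = 0$) and 1 when the second basis vectors are orthogonal ($\varphi = \pi/2$).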


The subspace distance defined above follows the assumption
that the samples within the same class share the same subspace
even though they are from different domains. Consequently,
the distance between subspaces across source and target
domains of a specific class tends to be smaller than that between
different classes. We show a quantitative comparison in
Section IV-A to demonstrate this advantage.

We further generate two affinity matrices: a $C \times K$ matrix
$A^{ST}$ to reflect the distances between the $K$ anchor subspaces
and the $C$ source subspaces, and a $K \times K$ matrix $A^{TT}$ to reflect
the pairwise distances among the $K$ anchor subspaces. More
specifically, we have

$$A^{ST}(i,j) = \exp\left(-D(M_i^S, M_j^T)\right)$$

and

$$A^{TT}(i,j) = \exp\left(-D(M_i^T, M_j^T)\right).$$


2) Cost function and optimization: Two important issues
are considered in assigning a label to each anchor subspace:
1) the distance between an anchor subspace and the same-label
source subspace should be small, and 2) the local topological
structures in the target domain should be preserved [?], i.e.,
anchor subspaces with a shorter distance are preferably
assigned to the same class. Considering these two issues,
we propose the cost function as follows:

$$\mathcal{C}(Y') = \sum_{i=1}^{C}\sum_{j=1}^{K} \|y_i - y'_j\|^2 A^{ST}(i,j) + \rho \sum_{j=1}^{K}\sum_{j'=1}^{K} \|y'_j - y'_{j'}\|^2 A^{TT}(j,j'), \qquad (3)$$

where $A^{ST}$ and $A^{TT}$ are the affinity matrices of inter-domain
subspace pairs and subspace pairs within the target domain,
respectively, and $\rho$ weights the second term.
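
To make the behaviour of this cost concrete, the sketch below evaluates it for two candidate labelings of three anchor subspaces. It is an illustration only: the affinity values are made up, the function names are hypothetical, and the trade-off weight `rho` on the within-target term is an assumed parameter name set to 1.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two label vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def labeling_cost(Y_src, Y_anchor, A_st, A_tt, rho=1.0):
    """Cross-domain term plus rho times the within-target term."""
    cross = sum(A_st[i][j] * sq_dist(Y_src[i], Y_anchor[j])
                for i in range(len(Y_src)) for j in range(len(Y_anchor)))
    within = sum(A_tt[j][k] * sq_dist(Y_anchor[j], Y_anchor[k])
                 for j in range(len(Y_anchor)) for k in range(len(Y_anchor)))
    return cross + rho * within


Y_src = [[1, 0], [0, 1]]          # labels of C = 2 source subspaces
A_st = [[0.9, 0.8, 0.1],          # affinity of source class 1 to K = 3 anchors
        [0.2, 0.1, 0.9]]          # affinity of source class 2 to K = 3 anchors
A_tt = [[1.0, 0.7, 0.1],          # pairwise affinities among the anchors
        [0.7, 1.0, 0.2],
        [0.1, 0.2, 1.0]]

good = [[1, 0], [1, 0], [0, 1]]   # anchors 1, 2 -> class 1; anchor 3 -> class 2
bad = [[0, 1], [1, 0], [0, 1]]    # anchor 1 moved to class 2

good_cost = labeling_cost(Y_src, good, A_st, A_tt)
bad_cost = labeling_cost(Y_src, bad, A_st, A_tt)
```

The labeling that agrees with the large affinities has the lower cost, because a label disagreement is penalized in proportion to the affinity of the pair involved.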

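Adding a constant term $\sum_{i=1}^{C}\sum_{j=1}^{C} \|y_i - y_j\|^2 I_{ij}$ into $\mathcal{C}$
and splitting the first term into two parts, we get:

$$\mathcal{C}(Y') = \frac{1}{2}\sum_{i=1}^{C}\sum_{j=1}^{K} \|y_i - y'_j\|^2 A^{ST}(i,j) + \frac{1}{2}\sum_{j=1}^{K}\sum_{i=1}^{C} \|y'_j - y_i\|^2 A^{ST}(i,j)$$

$$\qquad + \rho\sum_{j=1}^{K}\sum_{j'=1}^{K} \|y'_j - y'_{j'}\|^2 A^{TT}(j,j') + \sum_{i=1}^{C}\sum_{j=1}^{C} \|y_i - y_j\|^2 I_{ij}, \qquad (4)$$

where $I_{ij}$ is the $ij$th element of the identity matrix $I$.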

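Note that the first and second terms are equal. The cost
function $\mathcal{C}$ can be further written as:

$$\mathcal{C} = \sum_{i=1}^{C+K}\sum_{j=1}^{C+K} \|\mathcal{Y}_i - \mathcal{Y}_j\|^2 A_{ij}, \qquad (5)$$

where $\mathcal{Y} = [Y, Y']$ and
$A = \begin{bmatrix} I & \frac{1}{2}A^{ST} \\ \frac{1}{2}(A^{ST})^\top & \rho A^{TT} \end{bmatrix}$. We also
relax the constraint on this cost function by only requiring the
entries of each label vector in $\mathcal{Y}$ to sum to 1.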

By including the constraint term, the cost function can be
written in a matrix form [?]:

$$\mathcal{L}(\mathcal{Y}, \lambda) = \mathrm{Tr}(\mathcal{Y}\Lambda\mathcal{Y}^\top) + \lambda^\top(\mathcal{Y}^\top\mathbf{1} - \mathbf{1}) + \frac{\mu}{2}\|\mathcal{Y}^\top\mathbf{1} - \mathbf{1}\|_2^2, \qquad (6)$$

where $\Lambda = D - A$ is the Laplacian matrix of $A$, $D$ is the degree
matrix, which is a diagonal matrix with $D_{ii} = \sum_j A_{ij}$, and
$\lambda \in \mathbb{R}^{C+K}$ is the Lagrange multiplier. To minimize the objective
function $\mathcal{L}$, we update its two
unknowns $\mathcal{Y}$ and $\lambda$ alternately, using the following two steps:

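Step 1: Having $\lambda$ fixed, optimize $\mathcal{Y}$ by computing the
derivative of $\mathcal{L}$ with respect to $\mathcal{Y}$ and setting it to zero:

$$\frac{\partial \mathcal{L}}{\partial \mathcal{Y}} = 0 \;\Rightarrow\; \mathcal{Y}\Lambda + \mathbf{1}\lambda^\top + \mu(\mathbf{1}\mathbf{1}^\top\mathcal{Y} - \mathbf{1}\mathbf{1}^\top) = 0. \qquad (7)$$

Note that $\mathcal{Y}$ contains two parts, $Y$ and $Y'$, of which $Y$ is known.
Thus, to solve it, as in [?], we first split the Laplacian matrix
$\Lambda$ into 4 blocks along the $C$th row and column:

$$\Lambda = \begin{bmatrix} \Lambda_{CC} & \Lambda_{CK} \\ \Lambda_{KC} & \Lambda_{KK} \end{bmatrix}.$$

Similarly, we separate $\lambda$ into two parts:
$\boldsymbol{\lambda}_1 = [\lambda_1, \lambda_2, \cdots, \lambda_C]^\top$ and $\boldsymbol{\lambda}_2 = [\lambda_{C+1}, \lambda_{C+2}, \cdots, \lambda_{C+K}]^\top$.

Then $Y'$ can be updated by solving the following equation:

$$Y'^{(k+1)}\Lambda_{KK} + \mu\mathbf{1}\mathbf{1}^\top Y'^{(k+1)} = \mu\mathbf{1}\mathbf{1}^\top - Y\Lambda_{CK} - \mathbf{1}\boldsymbol{\lambda}_2^{(k)\top}. \qquad (8)$$

The solution is given by the Lyapunov equation [?]. With this
solution, $\mathcal{Y}^{(k+1)}$ can be obtained by putting $Y$ and $Y'^{(k+1)}$
together as $\mathcal{Y}^{(k+1)} = [Y, Y'^{(k+1)}]$.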
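
The step from Eq. (7) to Eq. (8) can be spelled out as follows. This is a reading of the derivation under the assumption that label vectors are stored as columns, so that $\mathcal{Y} = [Y, Y'] \in \mathbb{R}^{C \times (C+K)}$; $\boldsymbol{\lambda}_2$ denotes the last $K$ multipliers, as in the split above.

```latex
\begin{align*}
\mathcal{Y}\Lambda
  &= \begin{bmatrix} Y & Y' \end{bmatrix}
     \begin{bmatrix} \Lambda_{CC} & \Lambda_{CK} \\ \Lambda_{KC} & \Lambda_{KK} \end{bmatrix}
   = \begin{bmatrix} Y\Lambda_{CC} + Y'\Lambda_{KC} & \; Y\Lambda_{CK} + Y'\Lambda_{KK} \end{bmatrix}, \\
\text{last } K \text{ columns of (7):}\quad
  & Y\Lambda_{CK} + Y'\Lambda_{KK} + \mathbf{1}\boldsymbol{\lambda}_2^{\top}
    + \mu\left(\mathbf{1}\mathbf{1}^{\top}Y' - \mathbf{1}\mathbf{1}^{\top}\right) = 0, \\
\Longrightarrow\quad
  & Y'\Lambda_{KK} + \mu\,\mathbf{1}\mathbf{1}^{\top}Y'
    = \mu\,\mathbf{1}\mathbf{1}^{\top} - Y\Lambda_{CK} - \mathbf{1}\boldsymbol{\lambda}_2^{\top}.
\end{align*}
```

Only the last $K$ columns are needed because $Y$ is known. The result has the form $PX + XQ = R$ with unknown $X = Y'$, $P = \mu\mathbf{1}\mathbf{1}^\top$ and $Q = \Lambda_{KK}$, i.e., a Sylvester-type linear matrix equation, of which the Lyapunov equation is a special case.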

Step 2: Having $\mathcal{Y}$ fixed, perform a gradient ascent
update with a step of $\mu$ on the Lagrange multipliers:

$$\lambda^{(k+1)} = \lambda^{(k)} + \mu\left((\mathcal{Y}^{(k+1)})^\top\mathbf{1} - \mathbf{1}\right). \qquad (9)$$

To initialize this optimization process, we simply set
$\lambda^{(0)}$ and $Y'^{(0)}$ to zero, and set the maximal number of iterations
maxiter to 10000. The whole algorithm is summarized
in Algorithm 1.


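Algorithm 1 Labeling the Anchor Subspaces.

Input: Affinity matrix $A$, maxiter, labels $Y$ for $\{M_i^S\}_{i=1}^{C}$.
Output: Labels $Y'$ for $\{M_j^T\}_{j=1}^{K}$.

Initialization: $\lambda^{(0)}$, $Y'^{(0)}$ ← zero.
1: while (not converged and maxiter not reached)
2:   Update $Y'^{(k+1)}$ by solving the following equation:
       $Y'^{(k+1)}\Lambda_{KK} + \mu\mathbf{1}\mathbf{1}^\top Y'^{(k+1)} = \mu\mathbf{1}\mathbf{1}^\top - Y\Lambda_{CK} - \mathbf{1}\boldsymbol{\lambda}_2^{(k)\top}$
3:   Update $\mathcal{Y}^{(k+1)} = [Y, Y'^{(k+1)}]$.
4:   Update $\lambda^{(k+1)} = \lambda^{(k)} + \mu\left((\mathcal{Y}^{(k+1)})^\top\mathbf{1} - \mathbf{1}\right)$.
5:   Check the convergence.
6: end while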


Since we only require the entries of each label vector in $\mathcal{Y}$ to sum to 1,
after we get $Y'$, we set the bit with the maximal value in each
label vector to 1 and all the other bits to 0.
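
This final rounding step can be sketched as below. The function name `binarize` and the relaxed values are assumptions for illustration; ties are resolved by taking the first maximal entry, a choice the text does not specify.

```python
def binarize(label_vectors):
    """Replace each relaxed label vector by a one-hot vector at its largest entry."""
    result = []
    for v in label_vectors:
        # Index of the first maximal entry of v.
        k = max(range(len(v)), key=lambda idx: v[idx])
        result.append([1 if idx == k else 0 for idx in range(len(v))])
    return result


# Two relaxed label vectors whose entries sum to 1 (illustrative values).
relaxed = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
hard = binarize(relaxed)
```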

IV. Experimental Results 

In this section, we first evaluate the
subspace distance used in the proposed algorithm to
demonstrate that this distance is suitable for our algorithm.
Then we evaluate the proposed algorithm comprehensively
on two widely used cross-domain recognition datasets: an
object recognition image dataset for computer vision tasks and a
sentiment classification dataset for natural language processing
tasks.


A. Evaluation on the distance we used in proposed algorithm 

The distance matrix in Table I is given to show that the
subspace-based distance is suitable for our method. In this
table, we use the data from the object recognition dataset (details
in Section IV-B). Each entry denotes the distance between
a specific class (C1, ..., C10) in the source domain and a class
(C1, ..., C10) in the target domain. Note that all the numbers are
the average over all 12 pairs of source and target domains
(described in Section IV-B). We can see that the distances,
across two domains, between the same class are relatively
smaller than those between different classes. Thus, the results
also support the assumption that the samples of the
same class share the same subspace even though they are
from different domains, i.e., the distance between subspaces
across source and target domains of a specific class tends to
be smaller than that between different classes.

TABLE I

The distance matrix between classes across source and target domains.

        C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
C1      0.86  0.92  0.92  0.91  0.91  0.90  0.90  0.91  0.91  0.91
C2      0.92  0.85  0.92  0.90  0.94  0.92  0.93  0.92  0.93  0.94
C3      0.92  0.92  0.85  0.92  0.90  0.90  0.91  0.92  0.92  0.92
C4      0.91  0.90  0.92  0.87  0.92  0.90  0.90  0.91  0.91  0.91
C5      0.91  0.94  0.90  0.92  0.86  0.90  0.91  0.92  0.92  0.92
C6      0.90  0.92  0.90  0.90  0.90  0.86  0.88  0.90  0.92  0.90
C7      0.90  0.93  0.91  0.90  0.91  0.88  0.87  0.90  0.92  0.90
C8      0.91  0.92  0.92  0.91  0.92  0.90  0.90  0.91  0.92  0.92
C9      0.91  0.93  0.92  0.91  0.92  0.92  0.92  0.92  0.91  0.92
C10     0.91  0.94  0.92  0.91  0.92  0.90  0.90  0.92  0.92  0.90
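
A quick way to inspect the claim about Table I is to count the rows in which the same-class (diagonal) entry is the smallest in its row. The sketch below does this on the values of Table I; the function name and the strict/non-strict distinction are choices made for this illustration.

```python
# Rows and columns are C1..C10; values copied from Table I.
TABLE_I = [
    [0.86, 0.92, 0.92, 0.91, 0.91, 0.90, 0.90, 0.91, 0.91, 0.91],
    [0.92, 0.85, 0.92, 0.90, 0.94, 0.92, 0.93, 0.92, 0.93, 0.94],
    [0.92, 0.92, 0.85, 0.92, 0.90, 0.90, 0.91, 0.92, 0.92, 0.92],
    [0.91, 0.90, 0.92, 0.87, 0.92, 0.90, 0.90, 0.91, 0.91, 0.91],
    [0.91, 0.94, 0.90, 0.92, 0.86, 0.90, 0.91, 0.92, 0.92, 0.92],
    [0.90, 0.92, 0.90, 0.90, 0.90, 0.86, 0.88, 0.90, 0.92, 0.90],
    [0.90, 0.93, 0.91, 0.90, 0.91, 0.88, 0.87, 0.90, 0.92, 0.90],
    [0.91, 0.92, 0.92, 0.91, 0.92, 0.90, 0.90, 0.91, 0.92, 0.92],
    [0.91, 0.93, 0.92, 0.91, 0.92, 0.92, 0.92, 0.92, 0.91, 0.92],
    [0.91, 0.94, 0.92, 0.91, 0.92, 0.90, 0.90, 0.92, 0.92, 0.90],
]


def diagonal_is_row_minimum(matrix, strict=True):
    """For each row, tell whether the diagonal entry is the smallest entry."""
    flags = []
    for i, row in enumerate(matrix):
        others = [v for j, v in enumerate(row) if j != i]
        flags.append(row[i] < min(others) if strict else row[i] <= min(others))
    return flags


strict_count = sum(diagonal_is_row_minimum(TABLE_I, strict=True))
loose_count = sum(diagonal_is_row_minimum(TABLE_I, strict=False))
```

On these averaged values, the same-class distance is the strict row minimum for C1 to C7, ties with the minimum for C9 and C10, and is slightly above the minimum for C8, which matches the "relatively smaller" wording of the text.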


B. Cross-domain recognition on object dataset 

The first dataset that we evaluate on is an image dataset.
The whole dataset has four sub-datasets, which we use as
four domains, with 2533 images from 10 classes in total. The
first three sub-datasets were collected by [16], and include
images from amazon.com (Amazon), images collected with a digital
SLR (DSLR) and images collected with a webcam (Webcam). The fourth domain
is the Caltech-256 dataset (Caltech) [34]. Following the way of
feature extraction for each image in [3], we first use a SURF
[?] detector to extract points of interest from each image.
We then randomly select a subset of the points of interest
and quantize their descriptors to 800 visual words using
K-means clustering. Finally, we construct an 800-dimensional
feature vector for each image using the bag-of-visual-words
technique. For simplicity, hereafter we use “A”, “C”, “D”
and “W” to denote the “Amazon”, “Caltech”, “DSLR” and
“Webcam” domains, respectively.
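
The bag-of-visual-words step can be sketched as follows. This is a toy illustration, not the feature pipeline used in the experiments: the vocabulary has 3 words instead of 800, the 2-D descriptors are made up, and any normalization of the histogram is left out because the text does not specify one.

```python
import math


def bovw_histogram(descriptors, vocabulary):
    """Count, for each visual word, how many descriptors are closest to it."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the nearest visual word in Euclidean distance.
        nearest = min(range(len(vocabulary)), key=lambda k: math.dist(d, vocabulary[k]))
        hist[nearest] += 1
    return hist


vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]   # toy cluster centres
descriptors = [(0.1, 0.2), (0.9, 1.2), (3.5, 0.1), (1.1, 0.8)]
hist = bovw_histogram(descriptors, vocabulary)
```

With an 800-word vocabulary, the same procedure yields the 800-dimensional feature vector of one image.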

1) Single source domain and single target domain: We
report the results on all twelve possible pairs of source- and
target-domain combinations, following [?]. We ran our algorithm
20 times for each object-recognition task and report the average
accuracy rate (%) and standard deviation (%) in Table II. We

TABLE II

Results of single source and target domain on the object recognition dataset. “-” denotes that there is no result reported before.

Model          C→A        C→W        C→D        A→C        A→W        A→D
K-SVD [?]      20.5±0.8   -          19.8±1.0   20.2±0.9   16.9±1.0   -
SGF [3]        48.9±0.7   42.9±0.8   44.0±1.0   40.0±0.3   35.0±0.7   34.9±0.6
GFK [4]        40.4±0.7   35.8±1.0   41.1±1.3   37.9±0.4   35.7±0.9   35.2±0.9
Metric [16]    33.7±0.8   -          35.0±1.1   36.0±1.0   21.7±0.5   -
ITL [37]       49.2±0.6   43.3±0.7   44.4±1.2   38.5±0.4   40.0±1.3   39.6±0.6
SI [?]         45.4±0.3   37.0±5.1   42.3±0.4   40.4±0.5   37.9±0.9   32.1±4.5
SA(SVM) [25]   46.1       38.9       39.4       39.9       39.6       38.8
DASC [?]       49.8±0.4   45.4±0.9   48.5±0.8   39.1±0.3   37.7±0.7   39.3±0.8
TJM [9]        46.8       39.0       44.6       39.5       42.0       45.2
CJS (ours)     59.1±1.2   52.2±2.6   53.0±3.5   47.6±1.1   42.2±2.9   47.9±2.2

Model          W→C        W→A        W→D        D→C        D→A        D→W
K-SVD [?]      13.2±0.6   14.2±0.7   -          -          14.3±0.3   46.8±0.8
SGF [3]        32.3±0.4   35.1±0.5   72.9±0.7   34.9±0.3   34.7±0.4   82.0±0.6
GFK [4]        29.3±0.4   35.5±0.7   71.2±0.9   32.7±0.4   36.1±0.4   79.1±0.7
Metric [16]    32.3±0.8   38.6±0.8   -          -          30.3±0.8   55.6±0.7
ITL [37]       32.2±0.3   35.2±0.3   75.6±0.8   34.7±0.3   39.6±0.4   83.6±0.5
SI [?]         36.3±0.3   38.3±0.3   79.5±2.0   35.5±1.8   39.1±0.5   86.2±1.0
SA(SVM) [25]   31.8       39.3       77.9       35.0       42.0       82.3
DASC [?]       33.3±0.3   36.3±0.4   71.2±0.9   32.7±0.4   36.5±0.3   88.3±0.4
TJM [9]        30.2       30.0       89.2       31.4       32.8       85.4
CJS (ours)     33.5±1.6   39.5±1.3   89.4±1.8   34.5±1.9   37.9±1.6   89.3±1.7


compare our algorithm with nine other methods: K-SVD, SGF [3],
GFK [4], Metric [16], ITL [37], SI, SA [25], DASC, and TJM [9].
Their results in Table II are taken from previous papers, mostly
those of the original authors. Our algorithm performs best in
9 out of 12 domain pairs. In particular, in four domain pairs,
i.e., C→A, C→D, A→C, and C→W, it outperforms all the comparison
methods by more than 5%. In the other three domain pairs, it
performs comparably to the best-performing method. Note that
the “Metric” method [16] is a semi-supervised method.
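
The count of domain pairs on which the proposed method ranks first can be checked directly from the mean accuracies in Table II. The following is a minimal sketch of that check; the variable and function names are illustrative and not part of the original work, and the standard deviations are ignored.

```python
# Mean accuracies (%) transcribed from Table II; None marks "no result reported".
PAIRS = ["C->A", "C->W", "C->D", "A->C", "A->W", "A->D",
         "W->C", "W->A", "W->D", "D->C", "D->A", "D->W"]

BASELINES = {
    "K-SVD":   [20.5, None, 19.8, 20.2, 16.9, None, 13.2, 14.2, None, None, 14.3, 46.8],
    "SGF":     [48.9, 42.9, 44.0, 40.0, 35.0, 34.9, 32.3, 35.1, 72.9, 34.9, 34.7, 82.0],
    "GFK":     [40.4, 35.8, 41.1, 37.9, 35.7, 35.2, 29.3, 35.5, 71.2, 32.7, 36.1, 79.1],
    "Metric":  [33.7, None, 35.0, 36.0, 21.7, None, 32.3, 38.6, None, None, 30.3, 55.6],
    "ITL":     [49.2, 43.3, 44.4, 38.5, 40.0, 39.6, 32.2, 35.2, 75.6, 34.7, 39.6, 83.6],
    "SI":      [45.4, 37.0, 42.3, 40.4, 37.9, 32.1, 36.3, 38.3, 79.5, 35.5, 39.1, 86.2],
    "SA(SVM)": [46.1, 38.9, 39.4, 39.9, 39.6, 38.8, 31.8, 39.3, 77.9, 35.0, 42.0, 82.3],
    "DASC":    [49.8, 45.4, 48.5, 39.1, 37.7, 39.3, 33.3, 36.3, 71.2, 32.7, 36.5, 88.3],
    "TJM":     [46.8, 39.0, 44.6, 39.5, 42.0, 45.2, 30.2, 30.0, 89.2, 31.4, 32.8, 85.4],
}
CJS = [59.1, 52.2, 53.0, 47.6, 42.2, 47.9, 33.5, 39.5, 89.4, 34.5, 37.9, 89.3]


def best_baseline(col):
    """Highest reported mean accuracy among the comparison methods for one pair."""
    return max(v[col] for v in BASELINES.values() if v[col] is not None)


# Domain pairs where CJS strictly exceeds every reported baseline mean.
wins = [pair for i, pair in enumerate(PAIRS) if CJS[i] > best_baseline(i)]
print(len(wins), "of", len(PAIRS))  # prints "9 of 12"
```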

2) Multiple source/target domains: We then evaluate the
performance when there are multiple source or target domains.
For a fair comparison with other methods, we again take their
results directly from the literature. Thus, following [24], we
only conduct multiple source/target cross-domain recognition
on six source- and target-domain combinations, of which three
include two source domains and one target domain, and the other
three include one source domain and two target domains. When
there are multiple source/target domains, we simply merge the
data samples from all the source/target domains into a single
domain.
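
The merging step amounts to pooling the samples of the involved domains. A minimal sketch is given below, assuming each domain is stored as a pair of feature vectors and (possibly missing) labels; the function name `merge_domains` and the toy data are hypothetical.

```python
def merge_domains(domains):
    """Pool several domains into one by concatenating their samples.

    `domains` is a list of (features, labels) pairs; `labels` may be None
    for an unlabeled target domain, in which case None is stored per sample.
    """
    features, labels = [], []
    for X, y in domains:
        features.extend(X)
        labels.extend(y if y is not None else [None] * len(X))
    return features, labels


# Toy stand-ins for two labeled source domains (hypothetical data).
dslr = ([[0.1, 0.9], [0.2, 0.8]], ["mug", "bike"])
amazon = ([[0.7, 0.3]], ["mug"])
X_src, y_src = merge_domains([dslr, amazon])
```

After merging, the pooled source (or target) set is treated exactly like a single domain in the rest of the pipeline.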

When there are multiple source domains, we report the results
of six comparison methods, SGF [3], RDALR, FDDL, SDDL, HMP, and
the model in [24], as shown in Table III. For SGF [3], we report
its performance under both unsupervised and semi-supervised
settings. Note that RDALR, FDDL and SDDL are all semi-supervised
methods, while our proposed method is unsupervised. The proposed
method clearly outperforms all the comparison methods
significantly.

In principle, using multiple source domains should provide
more information for each class, which should result in
higher performance than using a single source domain. For
example, the domain combination “W, D→A” (41.3%) shows a
marginal improvement over the single-source cases “D→A”
(38.5%) and “W→A” (39.1%). In practice, however, we do not
always achieve higher performance when using multiple source
domains. For example, comparing the results in Tables II and
III, the performance of “D, A→W” (73.2%) lies between the
performances of the two single-source cases “D→W” (89.5%) and
“A→W” (42.4%). This is an interesting problem to be studied
in our future work.

When there are multiple target domains, we find only two
comparison methods, SGF [3] and the method from [24]. Both
unsupervised and semi-supervised settings were developed for
SGF, and the model from [24] is semi-supervised. We take their
performance from [3] and [24] and include it in Table IV. The
proposed algorithm performs better than both settings of SGF
and, in two out of three cases, better than the model from [24].

In principle, using multiple target domains does not provide
more information, since the labels are only available in the
source domain. Accordingly, the performance of using multiple
target domains should be the average of the performance on
each target domain separately, weighted by the number of data
samples in each involved target domain. The results in Tables
II and IV are largely aligned with this.
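
The weighted-average expectation can be written out as a short calculation. The sketch below uses the W→A and W→D mean accuracies from Table II; the per-domain sample counts (900 and 100) are placeholders chosen only for illustration and are not the actual dataset sizes.

```python
def expected_multi_target_accuracy(accuracies, sizes):
    """Sample-weighted average of per-target-domain accuracies (in %)."""
    total = sum(sizes)
    return sum(a * n for a, n in zip(accuracies, sizes)) / total


# W->A and W->D accuracies from Table II; the sample counts are hypothetical.
acc = expected_multi_target_accuracy([39.5, 89.4], [900, 100])
```

The larger target domain dominates the average, so the merged-target accuracy is expected to lie close to the accuracy on that domain.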
TABLE III

The results of multi-source domain adaptation on the object recognition dataset. Note that all the comparison methods are semi-supervised domain adaptation methods except the one marked “US”.

| Model | D, A→W | A, W→D | W, D→A |
|---|---|---|---|
| SGF [3] (US) | 31.0±1.6 | 25.0±0.4 | 15.0±0.4 |
| SGF [3] | 52.0±2.5 | 39.0±1.1 | 28.0±0.8 |
| RDALR | 36.9±1.1 | 31.2±1.3 | 20.9±0.9 |
| FDDL | 41.0±2.4 | 38.4±3.4 | 19.0±1.2 |
| SDDL | 57.8±2.4 | 56.7±2.3 | 24.1±1.6 |
| HMP | 47.2±1.9 | 51.3±1.4 | 37.3±1.4 |
| Gopalan et al. [24] | 51.3 | 36.1 | 35.8 |
| CJS (ours) | 73.2±2.5 | 81.3±1.3 | |

TABLE IV

The results of multi-target domain adaptation on the 2D object recognition dataset. “SS” and “US” denote the semi-supervised and unsupervised setting, respectively.

| Model | W→A, D | D→A, W | A→D, W |
|---|---|---|---|
| SGF [3] (US) | 28.0±1.9 | 35.0±1.7 | 22.0±0.2 |
| SGF [3] (SS) | 42.0±2.8 | 46.0±2.3 | 32.0±0.9 |
| Gopalan et al. [24] (SS) | 44.0 | 49.5 | 30.0 |
| CJS (ours) | 45.1±1.2 | 48.4±2.2 | 44.2±2.0 |


3) Performance under different parameter settings: There
are two main parameters in the proposed algorithm: 1) γ, the
desired average size of each group constructed in the target
domain using the K-means algorithm; and 2) N, the number of
data samples in each anchor subspace. To investigate the
sensitivity to these parameters, we vary each of them in turn
and report the accuracy, in percent, for each setting on eight
combinations of a single source and target domain and six
combinations of multiple source/target domains.

We vary γ from 5 to 30 with a step of 5; the results are
shown in Fig. 5(a). The performance varies only within a small
range for almost all the domain combinations, except for
“D, A→W”.

For N, we vary its value from 3 to 10 with a step of 1; the
results are reported in Fig. 5(b). Again, the performance varies
only within a small range for almost all the domain
combinations, except for “D→W” and “D, A→W” when N > 8.

Therefore, we can conclude that the performance of the
proposed algorithm is not very sensitive to the two parameters
γ and N. In our experiments, we consistently set γ to 20
and N to 5.
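
The sweep described above can be sketched as follows. Two points are assumptions rather than statements from the text: that the number of K-means groups is obtained by dividing the number of target samples by the desired average group size, and that one parameter is held at its default (20 for the group size, 5 for N) while the other is varied. The names `num_groups` and `sweep`, and the sample count 300, are hypothetical.

```python
group_size_values = list(range(5, 31, 5))  # desired average group size: 5, 10, ..., 30
n_values = list(range(3, 11))              # samples per anchor subspace: 3, 4, ..., 10

DEFAULT_GROUP_SIZE = 20
DEFAULT_N = 5


def num_groups(num_target_samples, group_size):
    """Number of K-means groups so that a group holds about `group_size` samples."""
    return max(1, round(num_target_samples / group_size))


# One-at-a-time sweep: vary one parameter, keep the other at its default.
sweep = [(g, DEFAULT_N) for g in group_size_values] + \
        [(DEFAULT_GROUP_SIZE, n) for n in n_values]

# Example: a hypothetical target domain with 300 samples at the default setting.
k = num_groups(300, DEFAULT_GROUP_SIZE)
```

Each `(group size, N)` pair in `sweep` would then correspond to one full run of the recognition pipeline on every domain combination.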



Fig. 5. (a) The results when setting various values for γ, which determines the
number of groups obtained by the K-means algorithm. (b) The results when
setting various values for N.

C. Cross-domain recognition on sentiment classification 
dataset 

Although the proposed algorithm was originally designed for
vision tasks, it can easily be applied to cross-domain tasks
in other areas. In this section, as an example, we compare
the proposed algorithm with seven other methods on a domain
adaptation task from the natural language processing area.

In this task, customers’ reviews on four different products
(kitchen appliances, DVDs, books and electronics) are collected
as four domains [40]. Each review consists of comment text and
a rating from 0 to 5. Reviews with a rating higher than 3 are
classified as positive samples, and the remaining reviews are
classified as negative samples. In total, there are 1000
positive reviews and 1000 negative reviews in each domain.
The goal of this task is to adapt a classifier trained on
one domain and use it to classify data samples in another
domain.
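
The labeling rule stated above is a simple threshold on the rating. A minimal sketch follows; the function name `sentiment_label` and the 1/0 encoding of positive/negative are illustrative choices.

```python
def sentiment_label(rating):
    """Return 1 (positive) for ratings above 3, otherwise 0 (negative)."""
    return 1 if rating > 3 else 0


labels = [sentiment_label(r) for r in (0, 3, 4, 5)]  # [0, 0, 1, 1]
```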

We follow the same experimental setup as described in [41]. In
each domain, 1600 reviews, including 800 positive reviews and
800 negative reviews, are used as the training set, and the
remaining 400 reviews are used as the testing set. We extract
unigram and bigram features from the comment texts and reduce
the feature dimension to 400. Each comment text is thus
represented by a 400-dimensional feature vector using the
bag-of-words technique.
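
As a rough illustration of this representation, the sketch below builds unigram and bigram counts and maps each text onto a fixed vocabulary. The text does not state how the dimension is reduced to 400, so keeping the most frequent terms is an assumption used here only as a stand-in; whitespace tokenization and all function names are likewise hypothetical.

```python
from collections import Counter


def ngrams(text):
    """Lower-cased unigrams and bigrams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def build_vocabulary(texts, size=400):
    """Keep the `size` most frequent terms (assumed selection rule)."""
    counts = Counter()
    for t in texts:
        counts.update(ngrams(t))
    return [term for term, _ in counts.most_common(size)]


def vectorize(text, vocabulary):
    """Bag-of-words vector: one term count per vocabulary entry."""
    counts = Counter(ngrams(text))
    return [counts[term] for term in vocabulary]


# Toy reviews (hypothetical); a small vocabulary keeps the example readable.
reviews = ["great knife set", "not a great blender", "great value"]
vocab = build_vocabulary(reviews, size=5)
vec = vectorize("great great knife", vocab)
```

With `size=400` and the 1600 training reviews of a domain, `vectorize` would yield the 400-dimensional vectors on which the classifiers operate.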

We conduct experiments on four pairs of source- and target-
domain combinations. The same experiment has also been
conducted in [41]. The performance is reported in Table V.
Overall, the proposed algorithm outperforms the other seven
methods. This shows that the proposed algorithm can also be
used for domain adaptation problems in non-vision areas.

TABLE V

Domain adaptation results on sentiment classification. K: kitchen, D: DVD, B: books, E: electronics.

| Model | K→D | D→B | B→E | E→K |
|---|---|---|---|---|
| TCA [13] | 60.4 | 61.4 | 61.3 | 68.7 |
| SGF [3] | 67.9 | 68.6 | 66.9 | 75.1 |
| GFK [4] | 69.0 | 71.3 | 68.4 | 78.2 |
| SCL [42] | 72.8 | 76.2 | 75.0 | 82.9 |
| KMM [43] | 72.2 | 78.6 | 76.9 | 83.5 |
| Metric [16] | 70.6 | 72.0 | 72.2 | 77.1 |
| Landmark [41] | 75.1 | 79.0 | 78.5 | 83.4 |
| CJS (ours) | 77.8 | 77.0 | 83.2 | 84.1 |


V. Conclusion 

This paper introduces a new subspace-based domain adaptation
algorithm. A compact joint subspace, covering both the source
and target domains, is constructed independently for each
class. The compact joint subspace carries information not only
about the intrinsic characteristics of the considered class,
but also about the specificity of each domain. Classifiers
are trained on these compact joint subspaces. The proposed
algorithm has been evaluated on two widely used datasets.
Comparison results show that the proposed algorithm
outperforms several existing methods on both datasets.